© John Wiley & Sons, Inc.
FIGURE 19-5: Linear and exponential trends fitted to accident data.
Working with unequal observation intervals
In this fatal accident example, each of the 12 data points represents the accidents observed during a
one-year interval. But imagine analyzing the frequency of emergency department visits among patients
who have been treated for emphysema, where there is one data point per patient. In that case, the width of
the observation interval may vary from one individual in the data to another. GLM lets you provide an
interval width along with the event count for each individual in the data. For arcane reasons, many
statistical programs refer to this interval-width variable as the offset.
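As a rough illustration of how an offset might be supplied in R, the sketch below fits a Poisson regression to simulated data. Everything in it is an assumption made for illustration: the data frame patients, the variables visits, treated, and followup_years, and the simulated rates are hypothetical and do not come from the accident or emphysema examples. With a log link, the interval width enters the model as offset(log(width)), so the coefficients describe event rates per unit of time rather than raw counts.

```r
# Hypothetical, simulated data (not from the text): one row per patient.
set.seed(42)
n <- 200
followup_years <- runif(n, min = 0.5, max = 5)  # observation interval differs by patient
treated <- rbinom(n, size = 1, prob = 0.5)      # hypothetical binary predictor
true_rate <- exp(-0.5 + 0.4 * treated)          # assumed visits per year
visits <- rpois(n, lambda = true_rate * followup_years)
patients <- data.frame(visits, treated, followup_years)

# The offset is the log of the interval width because the link is log:
# log(expected visits) = log(followup_years) + b0 + b1 * treated
fit <- glm(visits ~ treated + offset(log(followup_years)),
           family = poisson(link = "log"), data = patients)

# exp(coefficient) is a rate ratio (visits per year, treated vs. untreated)
rate_ratio <- exp(coef(fit)[["treated"]])
print(summary(fit)$coefficients)
print(rate_ratio)
```

The offset term has its coefficient fixed at 1, so it does not appear among the estimated coefficients.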
Accommodating clustered events
The Poisson distribution applies when the observed events are all independent occurrences. But this
assumption isn’t met if events occur in clusters. Suppose you count individual highway fatalities
instead of fatal highway accidents. In that case, the Poisson distribution doesn’t apply, because one
fatal accident may kill several people. This is what is meant by clustered events.
The standard deviation (SD) of a Poisson distribution is equal to the square root of the mean
of the distribution. But if clustering is present, the SD of the data is larger than the square root of
the mean. This situation is called overdispersion. GLM in R can correct for overdispersion if you
designate the distribution family quasipoisson rather than poisson, like this:
glm(formula = Accidents ~ Year, family = quasipoisson(link = "log"))
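To see what switching to quasipoisson changes, the following sketch simulates clustered counts and fits both families. It is only an illustration under stated assumptions: the variables period, accidents, and fatalities and all of the simulated values are hypothetical and are not the accident data from the figure. Each simulated accident kills one or more people, so the fatality counts are clustered.

```r
# Hypothetical, simulated data (not from the text).
set.seed(1)
n <- 100
period <- 1:n
accidents <- rpois(n, lambda = 20)   # independent events per period
# Each accident kills 1 + Poisson(1) people, so fatalities arrive in clusters
fatalities <- sapply(accidents, function(k) sum(1 + rpois(k, lambda = 1)))

fit_pois  <- glm(fatalities ~ period, family = poisson(link = "log"))
fit_quasi <- glm(fatalities ~ period, family = quasipoisson(link = "log"))

# Estimated dispersion: about 1 for true Poisson data, larger when clustered
dispersion <- summary(fit_quasi)$dispersion

# Standard errors from each fit
se_pois  <- summary(fit_pois)$coefficients[, "Std. Error"]
se_quasi <- summary(fit_quasi)$coefficients[, "Std. Error"]

print(dispersion)
print(cbind(se_pois, se_quasi))
```

The two fits give the same coefficient estimates. The quasipoisson fit multiplies each standard error by the square root of the estimated dispersion, which widens the confidence intervals to reflect the extra variability.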